
Gentle Start

Book

  • Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David

Complete Handwritten Notes

Intro

  • Machine Learning is the study of algorithms that can learn from data, gradually improving their performance.
  • Mathematical Statistics is the science of making decisions in the face of uncertainty.

Realizable Case

  • Binary classification: $(X, Y) = (\text{instance}, \text{label})$
  • Example: $X$ is an image, $Y \in \{+1, -1\}$ (e.g. "cat" vs. "dog"); $X \in S$, the set of all possible instances

Statistical Learning

  • We will assume that $(X, Y)$ is random, in other words, it has a probability distribution $P$, so we use the language of probability theory.

Supervised Learning

  • $(X, Y) \in S \times \{+1, -1\}$
  • $P$ is the distribution of $(X, Y)$, i.e. $P(A) = \mathbb{P}((X, Y) \in A)$
  • $\Pi$ is the distribution of $X$
  • Imposing a probabilistic model on $(X, Y)$ takes us into the realm of Statistical Learning Theory
  • Goal: predict the label $Y$ based on the observation $X$
  • A prediction rule is a function $g: S \rightarrow \{-1, +1\}$
  • The quality of a prediction rule $g$ is measured by the classification/generalization error $L(g) = P(Y \neq g(X))$
  • The training data is a sequence $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ of i.i.d. pairs with distribution $P$
  • An algorithm takes the training data as input and outputs $\hat{g}_n = \hat{g}_n((X_1, Y_1), \ldots, (X_n, Y_n))$
  • In general, for a fixed class $G$ of candidate prediction rules, we will consider 2 scenarios (see the sketch after this list):
    1. "Realizable" learning: there exists $g^* \in G$ such that $Y = g^*(X)$ with probability 1.
    2. "Agnostic" learning: there is no $g \in G$ such that $Y = g(X)$ with probability 1.

Realizable Scenario

  • Assume that the set $G$ of all possible classification rules is finite. By assumption, $\exists g^* \in G$ such that $Y = g^*(X)$ with probability 1 (this setting is sketched in code below).
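
A minimal sketch of this realizable setting, under assumed toy choices not taken from the notes (a finite class $G$ of threshold rules on $[0, 1]$, with $g^*$ in $G$): the learner returns a rule from $G$ that makes no mistakes on the training sample, which must exist because $g^*$ itself is consistent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed finite class G of threshold rules g_t(x) = +1 iff x > t, |G| = 21.
thresholds = np.linspace(0.0, 1.0, 21)
def g(t, x):
    return np.where(x > t, 1, -1)

t_star = 0.35  # g* = g_{0.35} belongs to G, so the problem is realizable

# Training data: i.i.d. X ~ Uniform[0, 1] labeled by g*, so Y = g*(X) w.p. 1.
n = 200
X = rng.uniform(0.0, 1.0, size=n)
Y = g(t_star, X)

# Learner: return a rule in G that is consistent with the sample
# (g* itself is consistent, so at least one such rule always exists).
t_hat = next(t for t in thresholds if np.all(g(t, X) == Y))

# For X ~ Uniform[0, 1], the true error of g_t is L(g_t) = |t - t*|.
print(f"learned threshold {t_hat:.2f}, true error {abs(t_hat - t_star):.3f}")
```

As the sample grows, the set of rules in $G$ consistent with the data typically shrinks around $g^*$, so the true error of the learned rule decreases.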